Computing Exact p-Value for Structured Motif
نویسندگان
چکیده
Extracting motifs from a set of DNA sequences is important in computational biology. Occurrence probability is a common used statistics to evaluate the statistical significance of a motif. A main problem is how to calculate the occurrence probability of the motif on the random model of DNA sequence efficiently and accurately. In this paper, we are interested in a particular motif model which is useful in transcription process. This motif, which is called structured motif, is composed two motif words on single nucleotide alphabet and with fixed spacers between them. We present an efficient algorithm to calculate the exact occurrence probability of a structured motif on a given sequence. It is the first non-trivial algorithm to calculate the exact p-value for such kind of motifs.
منابع مشابه
Computing exact p - values for DNA motifs ( Part I )
Motivation: Many heuristic algorithms have been designed to approximate p-values of DNA motifs described by position weight matrices, for evaluating their statistical significance. They often significantly deviate from the true p-value by orders of magnitude. Exact p-value computation is needed for ranking the motifs. Furthermore, surprisingly, the complexity of the
متن کاملSESOS: A Verifiable Searchable Outsourcing Scheme for Ordered Structured Data in Cloud Computing
While cloud computing is growing at a remarkable speed, privacy issues are far from being solved. One way to diminish privacy concerns is to store data on the cloud in encrypted form. However, encryption often hinders useful computation cloud services. A theoretical approach is to employ the so-called fully homomorphic encryption, yet the overhead is so high that it is not considered a viable s...
متن کاملEfficient exact motif discovery
MOTIVATION The motif discovery problem consists of finding over-represented patterns in a collection of biosequences. It is one of the classical sequence analysis problems, but still has not been satisfactorily solved in an exact and efficient manner. This is partly due to the large number of possibilities of defining the motif search space and the notion of over-representation. Even for well-d...
متن کاملExploring the Combinatorics of Motif Alignments Foraccurately Computing E-values from P-values
In biological and biomedical research motif finding tools are important in locating regulatory elements in DNA sequences. There are many such motif finding tools available, which often yield position weight matrices and significance indicators. These indicators, p-values and E-values, describe the likelihood that a motif alignment is generated by the background process, and the expected number ...
متن کاملExploring Highly Structure Similar Protein Sequence Motifs using SVD with Soft Granular Computing Models
Vital areas in Bioinformatics research is one of the Protein sequence analysis. Protein sequence motifs are determining the structure, function, and activities of the particular protein. The main objective of this paper is to obtain protein sequence motifs which are universally conserved across protein family boundaries. In this research, the input dataset is extremely large. Hence, an efficien...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2007